Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo
Abstract
In this paper, we present the first attempts to develop a machine translation (MT) system between Spanish and Shipibo-konibo (es-shp). Very few digital texts are written in Shipibo-konibo, and even fewer bilingual texts can be aligned, so we had to create a parallel corpus from both bilingual and monolingual texts. We describe how this corpus was built, as well as the process we followed to improve the quality of the sentences used to build a statistical MT (SMT) model. The results surpass the proposed dictionary-based baseline and are promising for further development, considering the size of the corpus used. Finally, we expect this MT system to be reinforced with additional linguistic rules and automatic language processing functions that are currently being implemented.
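The abstract compares the SMT system against a dictionary-based baseline but does not say how that baseline works. The following is a minimal sketch of one common form, word-by-word lexicon lookup, assuming a lexicon that maps each Spanish word to a list of candidate Shipibo-konibo translations. The function name `dictionary_translate` and the `shp_`-prefixed entries are hypothetical placeholders, not real Shipibo-konibo vocabulary and not the authors' implementation.

```python
from typing import Dict, List


def dictionary_translate(sentence: str, lexicon: Dict[str, List[str]]) -> str:
    """Word-by-word baseline: replace each token with its first dictionary sense.

    Tokens not found in the lexicon (out-of-vocabulary words) are copied
    through unchanged. No reordering or morphological analysis is attempted.
    """
    output: List[str] = []
    # Naive tokenization: lowercase and split on whitespace.
    for token in sentence.lower().split():
        senses = lexicon.get(token)
        # Take the first listed sense; fall back to the source token if unknown.
        output.append(senses[0] if senses else token)
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical placeholder entries, NOT real Shipibo-konibo vocabulary.
    toy_lexicon = {
        "casa": ["shp_casa"],
        "grande": ["shp_grande", "shp_grande_alt"],
    }
    print(dictionary_translate("La casa grande", toy_lexicon))
```

Running the script prints `la shp_casa shp_grande`: the unknown word "la" is copied through, and "grande" takes its first listed sense. Because a baseline of this kind ignores word order, context, and morphology, it is a fairly low bar for a trained SMT model to clear.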